Mining Massive Earth Science Data Sets for Large Scale Structure
Abstract: The traditional way to look for large scale structure in very large observational or model-generated data sets is to examine maps of means and standard deviations of parameters of interest on a coarse spatio-temporal grid. This approach is popular because it is easy to implement and understand, but unfortunately it throws away almost all of the distributional information in the data. Moreover, maps are computed for individual parameters of interest, and therefore do not retain information about relationships among two or more parameters. In this work, we use a modified data compression algorithm to produce multivariate distribution estimates for each grid cell. The algorithm optimally mediates between data reduction and fidelity loss using information-theoretic principles. Changes in these distribution estimates over time, space and resolution reflect large scale data structure. This is the basis for a data mining algorithm that characterizes those changes using a pseudo-metric for the distance between distributions. We demonstrate the method using data from the Atmospheric Infrared Sounder (AIRS) on board NASA’s Aqua satellite.
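The idea of summarizing each grid cell by a compressed distribution estimate and then comparing cells can be illustrated with a minimal sketch. The abstract does not specify the compression algorithm or the pseudo-metric, so everything here is an assumption made for illustration: plain k-means stands in for the modified compression algorithm, the (squared) energy distance between the weighted representatives stands in for the pseudo-metric, and the function names `summarize_cell` and `energy_distance` are hypothetical.

```python
import numpy as np


def summarize_cell(points, k=4, iters=25, seed=0):
    """Compress one grid cell's multivariate observations into k weighted
    representatives. Plain k-means is an ASSUMED stand-in for the paper's
    modified compression algorithm, which the abstract does not specify."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every representative.
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment gives the probability mass of each representative.
    sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq.argmin(axis=1)
    weights = np.bincount(labels, minlength=k) / len(points)
    return centers, weights


def _mean_pairwise(c1, w1, c2, w2):
    """Expected Euclidean distance between draws from two summaries."""
    d = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    return float(w1 @ d @ w2)


def energy_distance(s1, s2):
    """Squared energy distance between two cell summaries. This is an
    ASSUMED illustrative choice, not the paper's pseudo-metric."""
    (c1, w1), (c2, w2) = s1, s2
    value = (2.0 * _mean_pairwise(c1, w1, c2, w2)
             - _mean_pairwise(c1, w1, c1, w1)
             - _mean_pairwise(c2, w2, c2, w2))
    return max(value, 0.0)  # guard against tiny negative rounding error


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic two-parameter "cells"; real input would be per-cell observations.
    cell_a = rng.normal(0.0, 1.0, size=(500, 2))
    cell_b = rng.normal(0.0, 1.0, size=(500, 2))  # same regime as cell_a
    cell_c = rng.normal(3.0, 1.0, size=(500, 2))  # shifted regime
    sa, sb, sc = (summarize_cell(c) for c in (cell_a, cell_b, cell_c))
    print("a vs b:", energy_distance(sa, sb))
    print("a vs c:", energy_distance(sa, sc))
```

In this sketch each cell is reduced to a handful of representatives and their weights, so relationships among the parameters are retained in the summary rather than lost to per-parameter means. Cells drawn from the same regime should yield a small distance and cells from different regimes a larger one, which is the kind of change the abstract describes tracking over time, space and resolution.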
Similar resources
Shaking up Seismology: Data Mining for Earthquake Detection
Seismic sensors collect massive quantities of data that contain a wealth of information about processes within the earth. Seismologists are increasingly adopting data mining and machine learning techniques to identify previously unknown earthquakes in large seismic data sets. Our new earthquake detection method, Fingerprint and Similarity Thresholding (FAST), enables waveform-similarity-based e...
Knowledge Discovery from Disparate Earth Data Sources
Advances in data collection and data storage technologies have made it possible to acquire massive Earth science data sets. In principle, these data sets could be transformed into great scientific discoveries. However, due to the heterogeneous nature and to the scale of the available Earth science data, traditional analysis methods are challenged and much of these data remain largely unexplored...
Efficient Techniques for Range Search Queries on Earth Science Data
We consider the problem of organizing large scale earth science raster data to efficiently handle queries for identifying regions whose parameters fall within certain range values specified by the queries. This problem seems to be critical to enabling basic data mining tasks such as determining associations between physical phenomena and spatial factors, detecting changes and trends, and content bas...
Land Cover Change Detection using Data Mining Techniques
The study of land cover change is an important problem in the Earth science domain because of its impacts on local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species. Data mining and knowledge discovery techniques can aid this effort by efficiently discovering patterns that capture complex interactions between ocean temperature, air pr...
Knowledge Discovery From Global Remote Sensing and Climate Data: Results from Supervised and Unsupervised Data Mining
This paper describes results and lessons learned from research activities designed to develop data mining and machine learning methods for remote sensing and Earth science data sets. These data sets are acquired by Earth observing instruments onboard polar orbiting satellites, in-situ observations, and model reanalysis and provide a rich source of information related to the properties and dynam...